Learning fast and discriminative patch descriptors is a challenging topic in computer vision. Recently, many existing works train various descriptor learning networks by minimizing a triplet loss (or its variants), which is expected to decrease the distance of each positive pair and increase the distance of each negative pair. However, such an expectation has to be lowered due to the non-perfect convergence of the network optimizer to a local solution. Addressing this problem as well as the open problem of computational speed, we propose a Descriptor Distillation framework for local descriptor learning, called DesDis, where a student model gains knowledge from a pre-trained teacher model and is further enhanced via a designed teacher-student regularizer. This teacher-student regularizer is designed to constrain the difference between the positive (and also negative) pair similarities of the teacher model and those of the student model, and we theoretically prove that a more effective student model can be trained by minimizing a weighted combination of the triplet loss and this regularizer than its teacher, which is trained by minimizing the triplet loss alone. Under the proposed DesDis, many existing descriptor networks can be embedded as teacher models, and accordingly, both equal-weight and light-weight student models can be derived, which outperform their teachers in either accuracy or speed. Experimental results on three public datasets demonstrate that the equal-weight student models, derived from the proposed DesDis framework by employing three typical descriptor learning networks as teacher models, achieve better performance than their teachers and several other comparative methods. In addition, under a similar patch verification performance, the derived light-weight models can achieve 8-times or even faster speeds.
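As a rough illustration of the training objective described above (a sketch, not the authors' released implementation), the following PyTorch snippet combines a standard triplet margin loss on the student's descriptors with a teacher-student regularizer that penalizes the difference between the teacher's and the student's positive/negative pair similarities; the cosine similarity, the squared-difference penalty, and the weight `lambda_reg` are assumptions.

```python
import torch
import torch.nn.functional as F

def desdis_style_loss(anchor_s, pos_s, neg_s,
                      anchor_t, pos_t, neg_t,
                      margin=1.0, lambda_reg=1.0):
    """Weighted combination of a triplet loss on the student descriptors and a
    teacher-student regularizer on pair similarities (illustrative sketch)."""
    # Standard triplet loss on the student's descriptors.
    triplet = F.triplet_margin_loss(anchor_s, pos_s, neg_s, margin=margin)

    # Pair similarities (cosine) for the student and the frozen teacher.
    pos_sim_s = F.cosine_similarity(anchor_s, pos_s, dim=1)
    neg_sim_s = F.cosine_similarity(anchor_s, neg_s, dim=1)
    pos_sim_t = F.cosine_similarity(anchor_t, pos_t, dim=1)
    neg_sim_t = F.cosine_similarity(anchor_t, neg_t, dim=1)

    # Regularizer: keep the student's pair similarities close to the teacher's.
    reg = ((pos_sim_s - pos_sim_t) ** 2 + (neg_sim_s - neg_sim_t) ** 2).mean()

    return triplet + lambda_reg * reg


# Toy usage with random 128-D descriptors for a batch of 8 patch triplets.
a_s, p_s, n_s = (torch.randn(8, 128, requires_grad=True) for _ in range(3))
with torch.no_grad():  # teacher descriptors come from a frozen, pre-trained model
    a_t, p_t, n_t = (torch.randn(8, 128) for _ in range(3))
loss = desdis_style_loss(a_s, p_s, n_s, a_t, p_t, n_t)
loss.backward()
```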
Self-supervised monocular depth estimation has recently received much attention in computer vision. Most existing works in the literature aggregate multi-scale features for depth prediction via straightforward concatenation or element-wise addition; however, such feature aggregation operations generally neglect the contextual consistency between multi-scale features. Addressing this problem, we propose a Self-Distilled Feature Aggregation (SDFA) module, which simultaneously aggregates a pair of low-scale and high-scale features while maintaining their contextual consistency. The SDFA employs three branches to learn three feature offset maps respectively: one offset map for refining the input low-scale feature, and the other two for refining the input high-scale feature under a designed self-distillation manner. Then, we propose an SDFA-based network for self-supervised monocular depth estimation, and design a self-distilled training strategy to train the proposed network with the SDFA module. Experimental results on the KITTI dataset demonstrate that the proposed method outperforms the comparative state-of-the-art methods in most cases. The code is available at https://github.com/zm-zhou/sdfa-net_pytorch.
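The exact design of the SDFA module is not spelled out in this summary, so the following PyTorch sketch should be read only as one plausible interpretation: three convolutional branches predict 2-channel offset maps that are used to resample (via `grid_sample`) the upsampled low-scale feature and the high-scale feature before fusion. The warping scheme, the fusion by addition, and how self-distillation supervises the two high-scale branches are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDFASketch(nn.Module):
    """Simplified sketch of an offset-based feature aggregation module.

    Three conv branches each predict a 2-channel offset map; one offset refines
    the (upsampled) low-scale feature and the other two refine the high-scale
    feature, after which the refined features are summed.  The real SDFA module
    may use the offsets and the self-distillation supervision differently."""

    def __init__(self, channels):
        super().__init__()
        self.offset_low = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        self.offset_high_a = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        self.offset_high_b = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    @staticmethod
    def _warp(feat, offset):
        # Resample `feat` with a dense offset field (in normalized coordinates).
        n, _, h, w = feat.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=feat.device),
                                torch.linspace(-1, 1, w, device=feat.device),
                                indexing="ij")
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        grid = base + offset.permute(0, 2, 3, 1)
        return F.grid_sample(feat, grid, align_corners=True)

    def forward(self, low, high):
        low_up = F.interpolate(low, size=high.shape[-2:], mode="bilinear",
                               align_corners=True)
        x = torch.cat((low_up, high), dim=1)
        low_ref = self._warp(low_up, self.offset_low(x))
        high_ref_a = self._warp(high, self.offset_high_a(x))
        high_ref_b = self._warp(high, self.offset_high_b(x))
        # One refined high-scale branch could serve as the self-distillation
        # target for the other; here everything is simply fused by addition.
        return low_ref + high_ref_a + high_ref_b


# Toy usage: fuse a low-scale and a high-scale feature map.
fused = SDFASketch(channels=64)(torch.randn(1, 64, 24, 80), torch.randn(1, 64, 48, 160))
```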
Open-Set Recognition (OSR) aims to simultaneously detect unknown-class samples and classify known-class samples. Most existing OSR methods are inductive methods, which generally suffer from the domain shift problem, i.e., the model learned from the known-class domain might not be suitable for the unknown-class domain. Addressing this problem, and inspired by the success of transductive learning in alleviating the domain shift problem in many other visual tasks, we propose an Iterative Transductive OSR framework, called IT-OSR, which iteratively implements three explored modules, including a reliability sampling module, a feature generation module, and a baseline update module. Specifically, at each iteration, a dual-space consistent sampling approach is presented in the explored reliability sampling module to select some relatively reliable samples from the test set according to the pseudo labels assigned by a baseline method, which could be an arbitrary inductive OSR method. Then, a conditional dual-adversarial generative network under an orthogonal coding condition is designed in the feature generation module to generate discriminative sample features of both known and unknown classes according to the selected test samples and their pseudo labels. Finally, the baseline method is updated for sample re-prediction in the baseline update module by jointly utilizing the generated features, the selected test samples with pseudo labels, and the training samples. Extensive experimental results under both the standard-dataset and the cross-dataset settings demonstrate that, by introducing two typical inductive OSR methods into the proposed IT-OSR framework, the derived transductive methods perform better than 15 state-of-the-art methods in most cases.
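To make the iterative procedure concrete, here is a minimal skeleton of the loop described above. The `baseline` and `generator` interfaces (fit/predict with a confidence score, and a callable that synthesizes known/unknown-class features) are hypothetical stand-ins for the three modules, and the confidence-threshold sampling is a simplification of the dual-space consistent sampling.

```python
import numpy as np

def it_osr_loop(baseline, generator, train_x, train_y, test_x,
                n_iterations=3, reliability_threshold=0.9):
    """Skeleton of the iterative transductive procedure (illustrative only).

    `baseline` is any inductive OSR method exposing fit(X, y) and
    predict(X) -> (pseudo_labels, confidences); `generator` synthesizes
    known/unknown-class features from reliably pseudo-labelled test samples.
    Both interfaces are hypothetical stand-ins for the IT-OSR modules.
    """
    baseline.fit(train_x, train_y)
    for _ in range(n_iterations):
        # 1. Reliability sampling: keep test samples with confident pseudo labels
        #    (a simplification of the dual-space consistent sampling).
        pseudo_y, confidence = baseline.predict(test_x)
        reliable = confidence >= reliability_threshold
        sel_x, sel_y = test_x[reliable], pseudo_y[reliable]

        # 2. Feature generation: synthesize discriminative known- and
        #    unknown-class features conditioned on the selected samples.
        gen_x, gen_y = generator(sel_x, sel_y)

        # 3. Baseline update: retrain on training data plus selected and
        #    generated samples, then re-predict in the next iteration.
        aug_x = np.concatenate([train_x, sel_x, gen_x], axis=0)
        aug_y = np.concatenate([train_y, sel_y, gen_y], axis=0)
        baseline.fit(aug_x, aug_y)

    final_labels, _ = baseline.predict(test_x)
    return final_labels
```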
Self-supervised monocular depth estimation, which aims to learn scene depth from single images in a self-supervised manner, has recently received much attention. Despite the recent efforts in this field, how to learn accurate scene depth and alleviate the negative influence of occlusions on self-supervised depth estimation is still an open problem. Addressing this problem, we first empirically analyze the effects of the continuous and discrete depth constraints, which are widely used in the training processes of many existing works. Then, inspired by the above empirical analysis, we propose a novel network, called OCFD-Net, to learn an occlusion-aware coarse-to-fine depth map for self-supervised monocular depth estimation. Given an arbitrary training set of stereo image pairs, the proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual, resulting in a fine-level depth map. In addition, an occlusion-aware module is designed under the proposed OCFD-Net, which improves the capability of the learned fine-level depth map for handling occlusions. Experimental results on KITTI demonstrate that the proposed method outperforms the comparative state-of-the-art methods under seven commonly used metrics in most cases. Moreover, experimental results on Make3D demonstrate the effectiveness of the proposed method in terms of cross-dataset generalization ability under four commonly used metrics. The code is available at https://github.com/zm-zhou/ocfd-net_pytorch.
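The coarse-plus-residual composition can be sketched as a simple prediction head: a softmax over a set of discrete depth bins yields the coarse depth (as an expectation), and a separate continuous branch predicts a residual that is added on top. This is only an illustration of the idea; the log-spaced bin layout, the tanh-bounded residual, and all layer sizes are assumptions, and OCFD-Net's occlusion-aware module is not reproduced.

```python
import math
import torch
import torch.nn as nn

class CoarseToFineDepthHead(nn.Module):
    """Illustrative head: a discrete (binned) coarse depth plus a continuous
    residual gives the fine-level depth.  Bin layout, activations and layer
    sizes are assumptions; the occlusion-aware module is not included."""

    def __init__(self, in_channels, n_bins=64, min_depth=0.1, max_depth=100.0):
        super().__init__()
        self.bin_logits = nn.Conv2d(in_channels, n_bins, kernel_size=3, padding=1)
        self.residual = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)
        # Fixed, log-spaced depth bin centres (one common discretization choice).
        bins = torch.logspace(math.log10(min_depth), math.log10(max_depth), n_bins)
        self.register_buffer("bins", bins.view(1, n_bins, 1, 1))

    def forward(self, feat):
        # Coarse depth: expectation over a discrete distribution of depth bins.
        prob = torch.softmax(self.bin_logits(feat), dim=1)
        coarse = (prob * self.bins).sum(dim=1, keepdim=True)
        # Continuous residual refines the coarse prediction into the fine-level map.
        fine = coarse + torch.tanh(self.residual(feat))
        return coarse, fine


head = CoarseToFineDepthHead(in_channels=32)
coarse, fine = head(torch.randn(2, 32, 48, 160))   # toy decoder features
```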
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
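For readers unfamiliar with the distillation setup being compressed here, the snippet below shows a generic soft-label knowledge distillation loss on node-classification logits, in which the student matches the teacher's softened predictions while also fitting the ground-truth labels. It illustrates the plain GNN KD objective referred to above, not RELIANT's fairness-aware formulation; the temperature and mixing weight are assumptions.

```python
import torch
import torch.nn.functional as F

def gnn_kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Generic soft-label knowledge distillation loss (not RELIANT's
    fairness-aware objective): KL between softened teacher and student
    distributions, mixed with the usual cross-entropy on hard labels."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# Toy usage: node-level logits for 100 nodes and 7 classes.
s = torch.randn(100, 7, requires_grad=True)
t = torch.randn(100, 7)
y = torch.randint(0, 7, (100,))
gnn_kd_loss(s, t, y).backward()
```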
Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that the resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to a 310K class vocabulary, on the Animals with Attributes and ImageNet datasets.
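A minimal version of the distance constraint described above can be written as a hinge loss: each embedded sample must be closer to its correct vocabulary prototype than to any other prototype by a margin. The sketch below is illustrative only; the per-constraint weighting and the treatment of supervised versus unsupervised vocabulary atoms in the actual framework are omitted, and the squared Euclidean distance and margin value are assumptions.

```python
import torch

def vocabulary_margin_loss(embeddings, labels, prototypes, margin=0.1):
    """Hinge-style distance constraint (illustrative): each embedded sample
    should be closer to its correct vocabulary prototype than to every other
    prototype by at least `margin`."""
    dists = torch.cdist(embeddings, prototypes) ** 2      # (N, V) squared distances
    correct = dists.gather(1, labels.unsqueeze(1))        # distance to the true prototype
    # Hinge penalty whenever some other prototype is not far enough away.
    violations = torch.clamp(correct + margin - dists, min=0.0)
    mask = torch.ones_like(dists)
    mask.scatter_(1, labels.unsqueeze(1), 0.0)            # ignore the correct prototype
    return (violations * mask).mean()


# Toy usage: 16 samples embedded into a 300-D semantic space with 50 vocabulary atoms.
emb = torch.randn(16, 300, requires_grad=True)
proto = torch.randn(50, 300)
lbl = torch.randint(0, 50, (16,))
vocabulary_margin_loss(emb, lbl, proto).backward()
```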
Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
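As a very rough sketch of the input/output mapping described above (not the paper's architecture, which relies on a modality-translation network and a DensePose-style head), the toy model below flattens CSI amplitude and phase into a vector and decodes per-pixel logits over 24 body regions plus background together with per-region U, V coordinates; all tensor shapes (3x3 antenna pairs, 30 subcarriers, 56x56 output maps) are assumptions.

```python
import torch
import torch.nn as nn

class WiFiDensePoseSketch(nn.Module):
    """Toy stand-in for mapping WiFi CSI amplitude/phase to DensePose-style
    outputs: per-pixel logits over 24 body regions (+ background) and a U, V
    coordinate pair per region.  Shapes and layer sizes are assumptions."""

    def __init__(self, n_antennas=3, n_subcarriers=30, n_regions=24, hw=56):
        super().__init__()
        in_dim = 2 * n_antennas * n_antennas * n_subcarriers  # amplitude + phase
        self.hw, self.n_regions = hw, n_regions
        self.fc = nn.Sequential(nn.Linear(in_dim, 2048), nn.ReLU(),
                                nn.Linear(2048, 128 * (hw // 8) ** 2), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, (n_regions + 1) + 2 * n_regions, 4, stride=2, padding=1),
        )

    def forward(self, csi):                                   # csi: (N, in_dim)
        x = self.fc(csi).view(-1, 128, self.hw // 8, self.hw // 8)
        out = self.decoder(x)
        region_logits = out[:, : self.n_regions + 1]          # part segmentation
        uv = torch.sigmoid(out[:, self.n_regions + 1:])       # U, V in [0, 1] per part
        return region_logits, uv


net = WiFiDensePoseSketch()
logits, uv = net(torch.randn(2, 2 * 3 * 3 * 30))  # batch of 2 toy CSI frames
```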
With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few training examples. Exploring ICL to evaluate and extrapolate the abilities of LLMs has become a new trend. In this paper, we aim to survey and summarize the progress, challenges, and future work in ICL. We first present a formal definition of ICL and clarify its relation to related studies. Then, we organize and discuss advanced techniques of ICL, including training strategies, prompting strategies, and so on. Finally, we present the challenges of ICL and provide potential directions for further research. We hope our work can encourage more research on uncovering how ICL works and improving ICL in future work.
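To make the ICL setting concrete, the snippet below assembles a minimal few-shot prompt: a handful of labelled demonstrations followed by the query, which an LLM then completes without any parameter updates. The sentiment-classification template and wording are arbitrary choices for illustration.

```python
def build_icl_prompt(examples, query):
    """Assemble a minimal few-shot in-context-learning prompt: labelled
    demonstrations followed by the unanswered query."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)


demos = [("The plot was gripping and the acting superb.", "positive"),
         ("A dull, predictable film.", "negative")]
print(build_icl_prompt(demos, "I could not stop watching it."))
```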
Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network with several modules like CNN, LSTM and Attention. Recent methods combine the Transformer with these modules for better performance. However, it requires tedious optimization skills to train a network composed of mixed modules, making these methods inconvenient to use in practice. In this paper, we propose to design \emph{pure Transformer-based networks} for deep RL, aiming at providing off-the-shelf backbones for both the online and offline settings. Specifically, the Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT can achieve satisfactory performance in different settings, consistently.
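The cascaded design can be sketched compactly in PyTorch: an inner Transformer encodes each observation (split into patch tokens) and an outer Transformer processes the resulting sequence of per-observation embeddings over time. The patching scheme, mean pooling, layer sizes, and the 4-action policy head below are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TITSketch(nn.Module):
    """Rough sketch of a cascaded 'Transformer in Transformer' backbone for RL:
    an inner Transformer encodes each single observation (as patch tokens) and
    an outer Transformer processes the per-observation embeddings over time."""

    def __init__(self, obs_dim=64, patch=8, d_model=64, nhead=4, n_layers=2):
        super().__init__()
        assert obs_dim % patch == 0
        self.patch = patch
        self.embed = nn.Linear(patch, d_model)
        inner_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        outer_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.inner = nn.TransformerEncoder(inner_layer, n_layers)
        self.outer = nn.TransformerEncoder(outer_layer, n_layers)
        self.policy_head = nn.Linear(d_model, 4)   # e.g. 4 discrete actions (assumed)

    def forward(self, obs_history):                # (batch, time, obs_dim)
        b, t, d = obs_history.shape
        # Inner Transformer: tokenize each observation into patches, then pool.
        tokens = obs_history.reshape(b * t, d // self.patch, self.patch)
        per_obs = self.inner(self.embed(tokens)).mean(dim=1)     # (b*t, d_model)
        # Outer Transformer: process the temporal sequence of observation embeddings.
        temporal = self.outer(per_obs.reshape(b, t, -1))          # (b, t, d_model)
        return self.policy_head(temporal[:, -1])                  # act from the latest step


logits = TITSketch()(torch.randn(2, 10, 64))   # 2 trajectories of 10 observations each
```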
Recently, deep learning has shown its advantages in representation learning and clustering for time series data. Despite the considerable progress, the existing deep time series clustering approaches mostly seek to train the deep neural network by some instance reconstruction based or cluster distribution based objective, which, however, lack the ability to exploit the sample-wise (or augmentation-wise) contrastive information or even the higher-level (e.g., cluster-level) contrastiveness for learning discriminative and clustering-friendly representations. In light of this, this paper presents a deep temporal contrastive clustering (DTCC) approach, which for the first time, to our knowledge, incorporates the contrastive learning paradigm into the deep time series clustering research. Specifically, with two parallel views generated from the original time series and their augmentations, we utilize two identical auto-encoders to learn the corresponding representations, and in the meantime perform the cluster distribution learning by incorporating a k-means objective. Further, two levels of contrastive learning are simultaneously enforced to capture the instance-level and cluster-level contrastive information, respectively. With the reconstruction loss of the auto-encoder, the cluster distribution loss, and the two levels of contrastive losses jointly optimized, the network architecture is trained in a self-supervised manner and the clustering result can thereby be obtained. Experiments on a variety of time series datasets demonstrate the superiority of our DTCC approach over the state-of-the-art.
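The joint objective described above can be illustrated by the following sketch, which combines an auto-encoder reconstruction term, a soft k-means term on each view's representations, an instance-level NT-Xent contrast between the two views, and a cluster-level contrast between the two views' soft-assignment columns. The specific formulations and loss weights used in DTCC may differ.

```python
import torch
import torch.nn.functional as F

def nt_xent(a, b, temperature=0.5):
    """NT-Xent contrastive loss: matching rows of `a` and `b` are positives."""
    n = a.shape[0]
    z = F.normalize(torch.cat([a, b], dim=0), dim=1)
    sim = z @ z.t() / temperature
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                          float("-inf"))                    # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def soft_kmeans(z, centroids, temperature=1.0):
    """Soft k-means objective: assignment-weighted squared distances to centroids."""
    d2 = torch.cdist(z, centroids) ** 2                     # (N, K)
    q = F.softmax(-d2 / temperature, dim=1)                 # soft cluster assignments
    return (q * d2).sum(dim=1).mean(), q


def dtcc_style_loss(x1, x2, recon1, recon2, z1, z2, centroids,
                    w_rec=1.0, w_km=1.0, w_inst=1.0, w_clu=1.0):
    """Weighted combination of the four training terms (illustrative sketch)."""
    rec = F.mse_loss(recon1, x1) + F.mse_loss(recon2, x2)   # auto-encoder reconstruction
    km1, q1 = soft_kmeans(z1, centroids)
    km2, q2 = soft_kmeans(z2, centroids)
    inst = nt_xent(z1, z2)                  # instance-level contrast between the views
    clu = nt_xent(q1.t(), q2.t())           # cluster-level contrast between the views
    return w_rec * rec + w_km * (km1 + km2) + w_inst * inst + w_clu * clu
```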